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Atrial fibrillation (AF) is characterized by complex and irregular propagation patterns, and 
AF onset locations and drivers responsible for its perpetuation are the main targets 
for ablation procedures. ECG imaging (ECGI) has been demonstrated as a promising 
tool to identify AF drivers and guide ablation procedures, being able to reconstruct the 
electrophysiological activity on the heart surface by using a non-invasive recording of 
body surface potentials (BSP). However, the inverse problem of ECGI is ill-posed, and 
it requires accurate mathematical modeling of both atria and torso, mainly from CT or 
MR images. Several deep learning-based methods have been proposed to detect AF, 
but most of the AF-based studies do not include the estimation of ablation targets. In 
this study, we propose to model the location of AF drivers from BSP as a supervised 
classification problem using convolutional neural networks (CNN). Accuracy in the test set 
ranged between 0.75 (SNR = 5 dB) and 0.93 (SNR = 20 dB upward) when assuming time 
independence, but it worsened to 0.52 or lower when dividing AF models into blocks. 
Therefore, CNN could be a robust method that could help to non-invasively identify target 
regions for ablation in AF by using body surface potential mapping, avoiding the use 
of ECGI. 


Keywords: atrial fibrillation, body surface potentials, driver position, convolutional neural networks, deep learning 


1. INTRODUCTION 


Atrial fibrillation (AF) is the most common type of arrhythmia in clinical practice, affecting more 
than 33 million patients in the world (Chugh et al., 2014). AF is also a condition that increases 
the risk of the patients to suffer embolism, cardiac failure, stroke, and in the worst of cases, death 
(Fuster et al., 2006). 

Therefore, one of the clinical goals in AF patients is to restore sinus rhythm. This objective 
can be achieved by pharmacological treatment (Lip and Tse, 2007), but termination of arrhythmic 
processes is usually accomplished by ablation of the cardiac tissue. Main targets of ablation are AF 
onset locations and drivers responsible for AF perpetuation (Guillem et al., 2016). 

Previous human in vivo research showed different strategies to locate AF drivers and guide 
pulmonary veins isolation (PVI). In the case of invasive mapping procedures (Narayan et al., 
2012; Krummen et al., 2017; Navara et al., 2018), several catheters are introduced inside the atrial 
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chambers to record from 8 to 128 simultaneous electrograms 
(EGM) (Narayan et al., 2012). Despite the number of intracardiac 
signals, the large distance between catheter sensors (up to 1- 
2 cm) and the complex atrial anatomy limits the capability of 
intracardiac mapping systems to characterize the global electrical 
activity in AF (Oesterlein et al., 2016). Non-invasive procedures 
based on ECG imaging (ECGI) have been also tested to guide PVI 
(Haissaguerre et al., 2013, 2014; Dubois et al., 2015). However, 
PVI success rate gets lower in persistent AF, and phase singularity 
(PS)-guided ablation is suggested to be a reliable alternative for 
these more complicated cases (De Greef et al., 2018; Rottner et al., 
2020). 

Although ECGI has not been validated during AF propagation 
patterns, it has been demonstrated as a promising tool to identify 
AF drivers and guide ablation procedures (Cuculich et al., 2010; 
Haissaguerre et al., 2013, 2014; Pedrén-Torrecilla et al., 2016; 
Rodrigo et al., 2016). ECGI combines both numerical modeling 
of the bioelectric properties of the thorax and signal processing, 
with the aim of reconstructing the electrophysiological activity 
on the heart surface by using a non-invasive recording of body 
surface potentials (BSP) (Brooks and Macleod, 1997; Gulrajani, 
1998). However, the inverse problem of ECGI has several 
drawbacks. First, it requires an accurate mathematical modeling 
of both atria and torso, mainly from CT or MR images. Next, 
the inverse problem of ECGI is ill-posed because the propagation 
between the epicardium and the torso implies information loss 
due to signal attenuation (Rodrigo et al., 2014), and BSP are also 
blurred compared to the signals on the heart due to the laws of 
electromagnetic field theory. Therefore, regularization methods 
are needed to obtain reliable and stable epicardial potential 
reconstructions (Tikhonov and Arsenin, 1977; Oster and Rudy, 
1992; MacLeod and Brooks, 1998; Pedron-Torrecilla et al., 2011; 
Milanic et al., 2014). For these reasons, inverse problem-based 
approaches still need further improvement. 

In the last decade, machine learning (ML) and deep learning 
(DL) techniques have undergone considerable development in 
bioengineering, and this include novel research in AF. For 
example, DL has been used in AF detection by using recurrent 
neural networks (RNN) and convolutional neural networks 
(CNN) (Xiong et al., 2017), by STFT, stationary wavelet transform 
and CNN (Xia et al., 2018), and in the detection of individuals 
at risk of suffering from Paroxysmal AF by CNN (Pourbabaee 
et al., 2018). However, most of AF-based studies do not 
include the estimation of ablation targets. Nonetheless, recent 
research showed that ML and DL methods can be also used 
in more complex tasks, like heart surface potentials estimation 
from BSP using autoencoders (Bujnarowski et al., 2020) and 
rotor identification from 12-lead ECG using decision trees 
(Luongo et al., 2021). 

Therefore, in this study we propose to model the location of 
AF drivers from BSP as a supervised classification problem. We 
used CNN, which accounts for spatial characteristics, to address 
the location of AF drivers from previously labeled realistic 
computerized AF models (Figuera et al., 2016; Camara-Vazquez 
et al., 2020). 

The remaining of the study is organized as follows. In Section 
2, we introduce the computational models used for this study, the 
experimental set-up, performance metrics, and DL architecture. 


Final results are summarized in sections 3 and in section 4 main 
conclusions are presented. 


2. MATERIALS AND METHODS 


2.1. Computerized Models 
Geometric Models and EGM Computation 

Realistic 3D model for the atrial anatomy was composed of 
284,578 nodes and 1,353,783 tetrahedrons (673.4 + 130.3um 
between nodes) (Déssel et al., 2012) that considers a simplified 
single endocardium-epicardium layer for the atrial tissue. 

We simulated 13 different AF propagation patterns in both left 
atria (LA) and right atria (RA), with different complexity and 
driver positions: posterior left atrial wall (PLAW), left inferior 
pulmonary vein (LIPV), left superior pulmonary vein (LSPV), 
right inferior pulmonary vein (RIPV), right superior pulmonary 
vein (RSPV), right atrial appendage (RAA), and right atria free 
wall (RAFW). Sampling rate of the signals was fs = 500Hz, while 
their duration ranged from 2 to 5s. 

The final computerized models were comprised of N = 2,048 
nodes for atria, and M between 2,206 and 3,970 nodes for torsos 
(10 different geometries were used), under the assumption of a 
homogeneous, unbounded, and quasi-static conducting medium 
by summing up all effective dipole contributions over the entire 
model (Garcia-Molla et al., 2014; Figuera et al., 2016; Pedrón- 
Torrecilla et al., 2016; Rodrigo et al., 2017). Therefore, the EGMs 
of the entire model were computed as: 


5 F\ > 
v= (4) Yv (1) 


> 
7 


where V(F ) is the EGM signal at the measuring point, Vm is 
the transmembrane potential distribution across the atria, 7 is 
the distance vector between the measuring point and a point 
in the tissue domain, and r is its corresponding scalar distance. 
The transmembrane potentials were defined in a scattered 3D 
mesh, so the gradient was computed by interpolating a quadratic 
function that involves two surrounding points (Lawson, 1984): 


Vini— Vinj = cyxteryt+co3z+c4x? besy" +62" +c7xy-+egyz+coxz 
(2) 


where Vm, and Vm j are the transmembrane potentials at the 
points i and j; x, y, z are the incremental Cartesian coordinates 
from j to i, and coefficients cj to cy were computed by the 
least square method in, at least, nine neighboring points of 
each location. 

Atrial cell model 

The atrial cell model used is based on the one proposed in 
Nygren et al. (1998) and Koivumäki et al. (2014), where the 
electrical activity of a single myocyte is described in terms of their 
transmembrane potentials and ionic currents: 


ƏV _ Fion (3) 
3t Chr 
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FIGURE 1 | Position of the electrodes for body surface potential mapping (BSPM, left) and regions in atria where AF drivers can be found (right). 
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where V is the transmembrane potential, Ion is the 
transmembrane ionic currents, and Cm is the cell 
membrane capacitance. 

Then, the electrical propagation across the atrial tissue was 
simulated by including the transmembrane currents caused by 
the intercellular Gap junction current (due to the transmembrane 
potential gradient) in the previous formulation: 


N 
OVE Tion Vk — Vi 
— =- -NŅ Deg 4 
ra 2 si- (4) 


where Vx is the transmembrane potential at node k, V; is 
the transmembrane potential at the neighbor node i, Dķi 
is the diffusion coefficient between the node k and i, and 
d,; is the distance between those nodes. Since the atrial 
electrical conduction is anisotropic (its velocity is higher in the 
longitudinal fiber orientation than at transverse), the diffusion 
coefficient Dg; that modulates the intercellular ionic current is 
determined as follows: 


Dx = Diong E cos’a + Dirans * sina (5) 


where a is the angle between the longitudinal fiber orientation 
and the vector linking nodes k and i, and Diong and Dirans are 
the longitudinal and transverse diffusion coefficients, respectively 
(Rodrigo, 2016). Fibrotic and scar tissues were modeled by setting 
the diffusion values of the involved nodes to 0. In the case 
of the fibrotic tissue, a certain percentage of random nodes 
were disconnected, depending on the pattern to simulate. The 
final system of differential equations was solved by Runge-Kutta 
integration (using NVIDIA Tesla C2075 6G). 


2.2. Input Data 

2.2.1. Atrial Fibrillation Driver Detection as a ML 
Classification Problem 

We proposed to address the location of AF drivers as a supervised 
classification problem. We divided the atria into seven regions 
(Figure 1, right) where the AF driver can be found (Haissaguerre 
et al., 2014). Each region, which ranged from 1 to 7, represents 
a class to which the AF driver belongs. To obtain labeled data, 


AF driver location from the computerized model was manually 
classified into one region for each time-instant. An additional 
label (0) is assigned when no driver is found. 


2.2.2. Body Surface Potentials (BSP) 

Input data for DL algorithms are simulated BSP from each of 
the 10 torso geometries available. To compute the initial BSP 
y, the forward problem of ECGI was solved by computing the 
corresponding M x N transfer matrix A using the boundary 
element method (Barr et al., 1977; De Munck, 1992; Pedrón- 
Torrecilla et al., 2016): 


yt = Axt (6) 


where y; are the BSP from each time instants. Those simulated 
BSP must be referenced to the Wilson Central Terminal (WCT). 
This step is essential since real ECG recordings are referenced to 
this point due to the electrical noise of the ground, and it can be 
mathematically represented as: 


ECG = ECG notre — ECGwer (7) 


where ECGrotrer is the not-corrected ECG, ECGwcr is the WCT 
signal, computed as the average of the ECG signals at the WCT 
points, and ECG is the final corrected ECG. Therefore, if we 
apply the same methodology to BSP, y; referenced signals can be 
computed as: 


1 
Ytref = AXt— N >, Axt = Ax;— Noe Mwernt = (8) 
WCT newer WCT 
1 
=A- Mwcr ) xt (9) 
Nwcr 


where Nwcr is the number of WCT points, and Mwcr is a 
matrix of zeros except the rows that correspond to the WCT 
leads, that have the same values as the corresponding rows of 
the A matrix (Rodrigo, 2016). Therefore, to directly compute the 
BSP referenced to the WCT, it is possible to compute a corrected 
Actrw matrix as: 


1 
Nwcr 


Awcr =A- Mwcr (10) 
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FIGURE 2 | Construction of the different tensor layouts. Above: 3-channel tensor. Below: 1-channel tensor. 
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Once the WCT-referenced simulated BSP were computed, they 
were corrupted with additive Gaussian noise and filtered using: 
fourth-order bandpass Butterworth filter (fc; = 3 Hz and fcz 
= 30 Hz) (Figuera et al., 2016; Pedrén-Torrecilla et al., 2016). 
Signal-to-noise ratios (SNR) ranged from 5 to 50 dB. Finally, a 
set of 64 electrodes from each torso were chosen to represent a 
realistic multi-electrode vest used in electrophysiological studies, 
as shown in Figure 1 (left). 


2.2.3. Image Representation of BSP 

To obtain final input data, we represented the layout of electrodes 
in Figure 1 as images. For this purpose, we build two different 
types of tensors for each time instant (see Figure 2): 


e 3-channel tensor. This first approach consists on creating 3D 
matrices of shape (6 x 4 x 3). The first and last channel 
contain the BSP of the torso and back, respectively (24 
electrodes each), whereas the second channel contains BSP 
from the sides (16 electrodes distributed on the last 4 rows, 
while the rest are filled with zeroes). 

e 1-channel tensor. This second approach consists on organizing 
the BSP in a single-channel 2D matrix of shape (6 x 16 
x 1). To do that, we considered a multielectrode vest as a 
cylinder that was unrolled to a 12-column flat layout. Then, 
we added two additional mirror columns to each side of the 
matrix to represent the fact that the first and last columns 
of the tensor are in touch in a real 3D geometry. Finally, 
since sides electrodes are four for each column (instead of 
6), the empty values are filled with the mean of the three 
nearest electrodes. 


Finally, to increase the size of the images, we performed a bilinear 
interpolation to create tensors with shape (150 x 152 x 3) and 
(78 x 192 x 1), respectively. 


2.3. Convolutional Neural Network (CNN) 


Architectures 
2.3.1. Custom CNN 
CNN are a type of DL algorithm used mainly in image 
recognition and image classification. Comparing with other 
types of DL algorithms, this approach has a superior ability 
to extract features, increasing classification performance 
(Krizhevsky et al., 2017). 

CNN-based architectures consist on the following layers: 


e Input layer. In this case, input data are an image, i.e., a 3D 
matrix. This input matrix is also named as tensor. 

e Convolutional layer. In this type of layer, a filter of size 
(s,f) is applied to the input tensor. The main aim of the 
convolutional layer is to extract feature maps that can reflect 
certain characteristics of interest (edges, shape detection, 
etc.) (Li et al., 2019). This filtering process is performed 
using the convolution operation, which can be mathematically 
represented as: 


(11) 


§ t 
aij =f 2 5 Wm,nXi+mj+n + ) 


m=0 n=0 


where aj; is the output (activation) on coordinates (i,j), 
Xi+mj+n is the value of the input tensor in coordinates (i + 
m,j + n), Wun is the coefficient of the filter in coordinates 
(m,n), b is the bias, and f(-) is a nonlinear activation function. 
e Pooling layer. This type of layers is frequently used between 
convolutional layers, since they reduce the amount of 
information that convolutional layers generate. There are 
several ways to implement a pooling layer, and one of the 
most used is the max-pooling. In this case, a non-overlapped 
window is applied to the input feature map, and the output is 
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FIGURE 3 | Schematic overview of the CNN-based proposed architectures. (A) Custom CNN. (B) Adaptation of pre-trained network to a 8-unit output. 


Multi Layer Perceptron 


the maximum value of the considered window. If we consider 
a square window, the output feature map shape will be reduced 
in a 1/m factor, where m is the dimension of the window. 

e Dense layer. After applying a certain combination of 
convolutional and pooling layers, a multilayer perceptron 
(MLP) can be applied afterwards. Therefore, it is necessary 
to create an unidimensional vector from the output of 
the convolutional network (flatten operation). As explained 
before, the output layer of the MLP will have one unit for each 
value the network should output. 


The proposed CNN-based architecture is shown in Figure 3A. 
Input data are tensors of shape (150 x 152 x 3) or (78 x 192 
x 1), depending on the tensor architecture evaluated. We used 
three convolutional layers with 32, 64 and 64 filters with (3 x 
3) size. ReLU activation function was used in both convolutional 
layers. Max-pooling is applied after each convolutional layer 
[(2,2) window size]. Final dense layers are composed by layers 
of (128, 64) units (ReLU activation function), while the output 
layer is a Softmax layer composed by 8 units, one for each label. 
Finally, we used dropout between dense layers (0.6 dropout rate) 
to avoid overfitting. 


2.3.2. Pre-trained CNN 

Finding the right parameters for a CNN (number and dimensions 
of layers, pooling, etc.) can be a very complicated task, and 
training this type of network is also very computationally 
expensive. To address this task, there are several models that were 
previously trained with a certain dataset that can be also used to 
solve similar problems. This approach is called transfer learning 
(Pan and Yang, 2009). 

In this study, we decided to adapt the DenseNet121 (DN121, 
Huang et al., 2016), InceptionResNetV2 (IRNV2, Szegedy 
et al., 2016), and Xception (Chollet, 2016) models to our 
problem. These models were trained with the ImageNet database 
(Russakovsky et al., 2015), comprised of 14 million images 
corresponding to 1,000 different classes. Therefore, since we want 
to obtain an output with the prediction for 8 cases instead of 
1,000, we added several fully connected layers to the output of 
the pre-trained models. This adaptation is showed in Figure 3B. 
Dense layers were composed of (128,64) units (ReLU activation 


function), while the output layer is a Softmax layer composed by 
8 units. 

Finally, it is essential to remark that these pre-trained models 
require 3-channel input tensors. Therefore, to use the previously 
described 1-channel tensor, we decided to repeat the 2D matrix 
twice to obtain a 3-channel tensor [(78 x 192 x 3) shape]. 


2.4. Performance Metrics 
To assess the performance of the implemented DL models, we 
used four different metrics: 


e Accuracy (Acc). It is measured as 


TP full 


Acc = 
= Total 


(12) 


where TP,,1; is the total number of well-classified drivers (true 
positives), and Total is the total number of drivers. 

e Cohen’s Kappa (x). It is a robust statistic used for rating 
reliability testing (McHugh, 2012), and is computed as 


Po — Pe 
1— Pe 


(13) 


. = 


where po is the relative agreement among raters (identical 
to accuracy), and pe the hypothetical probability of chance 
agreement. A score of 1 represents a perfect agreement 
between raters, and a score of 0 represents the agreement that 
can be expected from random chance. Scores less than 0 mean 
that there is less agreement than chance. 

e Sensitivity (or true positive rate, TPR). It is the proportion of 
positive samples for a class that are correctly classified. It is 
measured as: 


TP 


TPR = ———— 
TP + FN 


(14) 
where TP is the number of well-classified drivers for a certain 
class (true positives), and FN is the number of drivers that 
was wrongly classified to another class (false negatives). It 
is considered a measure of how well a test can identify 
true positives. 
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e Specificity (or true negative rate, TNR). It is the proportion of 
negative samples for a class that are correctly classified. It is 
measured as: 


TN 


TNR = ———— 
TN + FP 


(15) 
where TN is the number of drivers that did not belong to 
a certain class that were correctly classified (true negatives), 
and FP is the number of drivers that was classified as positive 
for the same class, but they correspond to another class (false 
positives). It is considered a measure of how well a test can 
identify true negatives. 


2.5. Experimental Set-Up 

Final data set is composed of input data tensors, x;, of shape (150 

x 152 x 3) or (78 x 192 x 1) [(78 x 192 x 3) in the case of the 

pre-trained models], obtained from 64 BSP electrodes, and their 

corresponding label y;, which can have values 0, 1,..., 7. 
Training and test sets were split using two 

different approaches: 


e Time independence of input data. In this case, each time 
instant is considered as an independent sample. The whole 
data set was then randomly split into training (80%) and 
test (20%) sets, using hold-out validation during the training 
process (20%). In this scenario, samples from each AF model 
are randomly distributed to the training, validation, and test 
sets, but training, validation, and test processes are carried 
out with completely different samples, i.e., samples from the 
training set are not used in the validation and test sets. 

e Division of AF models in consecutive blocks. To describe 
a more realistic scenario, we divided each AF computerized 
model in three independent blocks: training (first 80% of the 
length of the signals), hold-out validation (last 20% of the 
training set), and test (final 20% of the length of the signals). 
Therefore, these three blocks contain consecutive samples, i.e., 
the training process is carried out using the first block of time 
instants, and test is achieved with the last block of samples. 


On the other hand, we had to address imbalance of data. In 
our computerized models, there are several atrial areas where 
the driver only appears in one single AF computerized model, 
whereas other areas are more susceptible of containing a driver. 
Therefore, there are several atria regions that are over represented 
in the data set. To address this problem, we tried to weigh 
the classes accordingly to the probability of occurrence (class 
weight). It applies a higher penalization in wrong classifications 
in under-represented classes. 

Regarding the training process, we trained the different 
models for a maximum number of epochs of 1,000, with 
reduction in the learning rate and early stopping if there is 
no improvement in the loss value during a certain number 
of training epochs. Moreover, we use the model checkpoint 
technique to get the best model obtained during the training 
process according to a metric of interest (loss value for the 
custom CNN, and validation loss for the pre-trained CNN). 
Finally, in the case of the pre-trained models, we followed two 


different training approaches: training the dense layers and the 
last two convolutional layers (while freezing the rest of the CNN, 
partial training), and re-training the whole pre-trained network 
(full training). 


3. RESULTS 
3.1. Performance of CNN 


The first carried out experiment consisted on assessing the 
obtained performance with each tensor architecture. For this 
purpose, we trained the CNN architecture explained in Figure 3 
with 1-channel and 3-channel tensors. In both cases, tensors 
were obtained from the same BSP dataset. Therefore, we got two 
different CNN models to be compared. Moreover, to simulate a 
realistic scenario, we used noisy BSP (SNR = 20 dB). 

Table 1 shows the results considering each time instant as an 
independent sample and division in blocks. In the first case, we 
were able to correctly locate the 95 and 91% of drivers in the 
training and test sets, respectively (Cohen’s kappa of 0.92 and 
0.88) when using 3-channel tensors. Results obtained with the 
CNN model trained with 1-channel tensors were very similar, 
with an accuracy of 0.94 and 0.91 in training and test sets, 
respectively (Cohen’s kappa of 0.92 and 0.88). 

On the other hand, results for division in blocks were worse. 
In this scenario, accuracy was 0.97 and 0.49 in training and test 
sets when training with 3-channel tensors, respectively (Cohen’s 
kappa of 0.96 and 0.33). Metrics when training with 1-channel 
tensors were similar in the training set but slightly better in 
the test set (accuracy of 0.52 and Cohen’s kappa of 0.37), but 
worse compared with time independence approach. These results 
suggest that we are overfitting the model to the training set. 

Figure 4 shows the confusion matrix obtained in the test set 
for both training approaches and tensor types, while Table 2 
shows the accuracy, sensitivity and specificity obtained for each 
label when training with 1-channel tensors. In the case of time 
independence, accuracy ranged from 0.81 to 0.975, although it 
worsened to 0.43 in the case of the drivers located in the septum 
area (label 7, accuracy of 0.430). These results are justified by 
the imbalance of our dataset. Labels from 0 to 5, although their 
population are also imbalanced, are highly represented, but the 
number of samples with labels 6 and 7 is very low. Results for 
the CNN model trained with 3-channel tensors were slightly 
worse. Regarding sensitivity, the system is able to detect with 
high precision clinically relevant regions. Those areas where 
ECGI has less reconstruction capacity (like the septum area), the 
sensitivity is lower. However, it is essential to remark that both in 
clinical practice and in our database these regions present a lower 
probability of present drivers. Regarding specificity, obtained 
scores were always above 0.94. 

In the case of division in blocks, results were significantly 
worse. Highest accuracy was obtained for the label 3 (0.80), 
which is the one with the largest representation in the dataset. 
However, accuracy for the rest of labels ranged from 0.22 to 0.56, 
and near 0 for labels 6 and 7. The high temporal redundancy 
between consecutive signal samples could explain overfitting in 
this scenario, although the training set was previously shuffled 
before fitting the model. The low number of computerized 
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TABLE 1 | Performance metrics obtained with CNN assuming independent time instants and division in blocks. 


Independent time instants Division in blocks 
3-channel tensor 1-channel tensor 3-channel tensor 1-channel tensor 
Training accuracy 0.954 0.942 0.971 0.965 
Training Kappa 0.940 0.924 0.962 0.955 
Test accuracy 0.913 0.912 0.499 0.526 
Test Kappa 0.887 0.886 0.336 0.373 
A 3-channel tensor 1-channel tensor 
20000 
40 19 237 67 21 
ars 17500 
103 7485 47 1309 2 161 7616 43 1126 3 
p 15000 


32 118 2323 293 1 0 48 176 2243 300 0 0 


_ E 12500 
g 194 286 46 PAIRE) 14 2 g 238 242 41 IJe 2 1 
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— EFA 10000 
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13 913 5177 38 0 0 7 1100 5004 0 = 
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FIGURE 4 | Confusion matrices obtained in the test set (1-channel tensor) for time independence (A) and division in blocks (B). 


AF models makes also difficult to help the CNN model to the most represented atrial zone (label 3). Finally, the rate of TN 


generalize. Regarding sensitivity, results were also significantly is very high for every label, except for the label 3, with a specificity 
worse, although the system was able to get a sensitivity of 0.799 in of 0.635. 
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TABLE 2 | Accuracy, sensitivity, and specificity scores obtained with 1-channel CNN assuming time independence and division in blocks (by labels, test set). 


Atrial region (time independence) 


No driver (0) 2 3 4 5 6 T. 
Accuracy 0.951 0.851 0.810 0.975 0.959 0.813 0.620 0.430 
Sensitivity (TPR) 0.951 0.850 0.810 0.974 0.959 0.813 0.620 0.430 
Specificity (TNR) 0.987 0.989 0.998 0.948 0.968 0.992 0.999 0.999 
Atrial region (division in blocks) 
No driver (0) 2 3 4 5 6 7 
Accuracy 0.567 0.228 0.261 0.801 0.499 0.457 0.002 
Sensitivity (TPR) 0.569 0.231 0.268 0.799 0.494 0.457 ) 0.001 
Specificity (TNR) 0.935 0.918 0.993 0.635 0.918 0.962 1 0.998 
i Accuracy ko Cohen's Kappa 
[T] 
0.94 0.94 
0.8 4 0.84 
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FIGURE 5 | Performance of CNN models on noisy test signals when assuming time independence (T) and division in blocks (D). Partially trained CNN models are 


SNR (dB) 


3.2. Performance of Pre-trained CNN and 


Noise Robustness 

In this second experiment, we evaluated the possibility of using 
pre-trained CNN models for AF driver localization, instead of 
using a custom CNN architecture. Models were trained with 1- 
channel tensors obtained with noisy BSP (SNR = 20 dB). Since 
pre-trained models require 3-channel tensors, we repeated the 
tensor twice to obtain 3-channel tensors. The custom CNN model 
was trained with the original 1-channel tensor. 


Moreover, to assess the noise robustness of those DL models, 
we tested them with tensors whose associated BSP were corrupted 
with noise (SNR from 5 to 50 dB, in steps of 5 dB). This procedure 
was repeated 20 times in order to obtain test signals with different 
noise but same SNR. Then, the mean and SD of performance 
metrics were computed. 

Figure 5 shows the performance metrics obtained for the test 
set, assuming time independence and division in blocks, for 
each CNN model, whereas Table 3 shows the average accuracy 
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TABLE 3 | Average accuracy values obtained for the different fully trained networks and noisy PSD (SNR = 5, 10, 15 and 20 dB, test set). 


Independent time instants 


5 dB 10 dB 15 dB 20 dB 
Custom CNN 0.751 + 0.001 0.872 + 0.001 0.905 + 0.0005 0.912 + 0.0003 
Xception (f) 0.788 + 0.001 0.899 + 0.001 0.927 + 0.0007 0.933 + 0.0005 
InceptionResNetV2 (f) 0.802 + 0.001 0.904 + 0.001 0.929 + 0.0008 0.934 + 0.0003 
DenseNet121 (f) 0.799 + 0.001 0.902 + 0.001 0.927 + 0.0006 0.932 + 0.0005 

Division in blocks 

5 dB 10 dB 15 dB 20 dB 
Custom CNN 0.479 + 0.002 0.514 + 0.001 0.524 + 0.001 0.527 + 0.001 
Xception (f) 0.453 + 0.003 0.498 + 0.002 0.513 + 0.001 0.517 + 0.0006 
InceptionResNetV2 (f) 0.452 + 0.002 0.490 + 0.002 0.504 + 0.001 0.508 + 0.0006 
DenseNet121 (f) 0.437 + 0.002 0.477 + 0.001 0.496 + 0.001 0.503 + 0.001 


values for the custom CNN model and fully trained pre-trained 
models (SNR from 5 to 20 dB). In both cases, performance of 
pre-trained networks was significantly better when training the 
full network instead of performing a partial training (where the 
network was freezed except the last two convolutional layers 
and the fully connected ones). For time independence, average 
accuracy values obtained with fully trained pre-trained models 
ranged from 0.788 (5 dB) to 0.93 (20 dB upward), whereas 
the best accuracy value obtained for partially trained models 
was 0.90 (Xception model). Regarding average Cohen’s kappa 
metric, it ranged from 0.72 (5dB) to 0.91 (20 dB upward) 
for fully-trained models, while the highest obtained score for 
all partially trained pre-trained models was 0.87 (Xception 
model). 

On the other hand, average accuracy values obtained for 
the division in blocks approach were significantly worse for 
all the CNN models. The best average accuracy value was 
0.51 among the different pre-trained models (SNR = 20 
dB upward, Xception fully trained model), whereas the best 
overall accuracy value was 0.527 for the custom CNN model. 
Both are much lower values than the ones obtained for 
time independence. 

Regarding the performance when testing with much noisier 
signals, performance degraded for SNR = 5 dB, with average 
accuracy scores below 0.802 when assuming time independence 
(0.479 with division in blocks). However, it improved from SNR 
= 10 dB with average accuracy values over 0.87 in the time 
independence approach (in the division in blocks approach the 
best score for SNR = 10 dB was 0.51 for the custom CNN 
model). Additionally, SD values of the accuracy were always 
below 0.001, which suggests that the CNN models are robust to 
noise changes. 

Finally, performance metrics for fully trained pre-trained 
models were very similar between them when assuming time 
independence, outperforming the custom CNN model. However, 
in the case of division in blocks, the custom CNN outperformed 
fully trained pre-trained networks, although differences between 
these models were very small. 


4. DISCUSSION AND CONCLUSIONS 


The use of CNNs can help to identify target regions for 
ablation using body surface potential mapping, avoiding ECGI. 
The proposed method, which converts BSP into images, 
has been demonstrated to be accurate and robust to noise, 
ie the performance just degrades for very low values 
of SNR. 

Regarding the architecture of the tensors, the 1-channel 
tensor-based architecture was able to obtain more accurate 
results than the 3-channel one, both assuming time 
independence or division in blocks. In relation to the 
training process, the single-channel tensor-based CNN 
model required a higher number of epochs to be fitted 
than in the 3-channel one. However, since each epoch 
is slower in the 3-channel architecture (3D matrix), 
the final training process becomes faster when using 
single-channel tensors. 

Moreover, this methodology makes transfer learning very easy 
to apply, since it can be used to adapt much more complex 
pre-trained CNN models to a very specific task with promising 
results. However, results were significantly better when re- 
training the whole network (much slower procedure) than when 
training just the final layers. Therefore, using pre-trained models 
requires further research. 

Although this study showed very promising results, it 
has several limitations that should be taken into account. 
The first one is the size and balance of our dataset. It 
is composed by 130 sets of BSP, obtained with 10 torso 
geometries but from only 13 AF computerized models, so the 
number of represented propagation patterns is low. Moreover, 
the distribution of drivers across the seven defined atrial 
regions is not balanced, i.e. there are regions that are 
over-represented, and performance will be worse on under- 
represented regions. However, we have decided to use the 
proposed division in seven regions because it represents a 
clinical-based classification of the areas where the AF drivers 
are more commonly found and has been already clinically 
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used to guide AF ablation strategies demonstrating to have a 
clear clinical significance (Haissaguerre et al., 2014). Besides, 
the proposed classification method can be easily extrapolated 
to other atrial geometry divisions based on a higher number 
of smaller regions, being able to get a higher resolution in the 
driver classification. 

Another important problem to be faced is the training 
approach. In this study, we first trained the models with 
random samples that belongs to the 13 AF models available 
(time independence). However, in a real situation, the DL 
model will have to compute a prediction with a data set 
that the network has never seen. Therefore, we decided to 
train the models with a chunk of signals of each model, 
while testing with the rest. In this scenario, performance 
substantially worsened because of the high temporal redundancy 
between consecutive time instants and the low number of 
AF models. Both facts lead to overfit to the training set, 
even when using tools like dropout. Using RNNs could be 
useful to face the redundancy problem, since they are able to 
extract temporal characteristics of BSP that are omitted when 
using CNN. 

In a real scenario, assuming model independence (i.e., training 
with some AF models and testing with the rest) should be 
the way to go, but it will require the availability of a higher 
number of computerized models and propagation patterns. 
Finally, another point of future work will be applying this 
methodology with real patient data. However, nowadays there is 
a lack of gold standard for the identification of AF drivers, and 
there are not labeled clinical data that could help us to validate 
this methodology. 
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